Abstract

Emerging applications in engineering such as crowd-sourcing and (mis)information propagation 
involve a large population of heterogeneous users or agents in a complex network who strategically make 
dynamic decisions. In this work, we establish an evolutionary Poisson game framework to capture the 
random, dynamic and heterogeneous interactions of agents in a holistic fashion, and design mechanisms 
to control their behaviors to achieve a system-wide objective. We use the antivirus protection challenge 
in cyber security to motivate the framework, where each user in the network can choose whether or 
not to adopt the software. We introduce the notion of evolutionary Poisson stable equilibrium for the 
game, and show its existence and uniqueness. Online algorithms are developed using the techniques of 
stochastic approximation coupled with the population dynamics, and they are shown to converge to the 
optimal solution of the controller problem. Numerical examples are used to illustrate and corroborate
our results.


I. Introduction 

Emerging engineering applications such as social networks, crowdsourcing, and the
Internet of Things (IoT) involve a large population of heterogeneous devices or users. These
agents interact with each other in a complex environment, in which each agent makes strategic
and dynamic decisions in response to the group of agents it interacts with. The group of agents
can be random and can change over time. One illustrative example is 5G wireless communication
networks. Each mobile device can communicate with a number of heterogeneous devices at different
times, and makes an investment decision on antivirus software. This situation is analogous
to the epidemic spread of influenza, in which each individual decides whether to be vaccinated.
The objective from the perspective of the system designer or government agency is to control the
behavior of the large population and induce a desirable outcome that is conducive to the sustainable
growth of the population. To address this issue, the first step of this research is to establish
an integrated system framework that captures the random, dynamic, and heterogeneous
features of the population.

One useful tool to describe the dynamic evolution of a population is evolutionary game theory,
which often assumes homogeneous and pairwise interactions between agents. This underlying
assumption makes the classical framework insufficient to capture the network properties
of the agents and the heterogeneous local interactions among the players. In this paper, we
develop an evolutionary Poisson game framework that bridges the gap between evolutionary
game theory and the heterogeneity of the population. We enrich the game-theoretic model by
incorporating network topology, the size of the population, and the epidemic process to establish
a holistic framework that can be used to address the engineering applications of interest. These
unique aspects of the model lead to a customized evolutionary stability equilibrium concept and
its corresponding replicator dynamics for describing the evolution of the population.





Fig. 1. Feedback system between controller and the evolutionary Poisson game model. Each circle represents a group of agents 
who interact through a network. Each agent is denoted by its type. The controller observes the population states and controls 
the population to achieve a global objective. 
The overarching goal of this work is to control the behaviors of the large population to
achieve a system-wide objective. Building on the evolutionary Poisson game framework, we
leverage the techniques of stochastic approximation to develop an online learning algorithm.
Fig. 1 illustrates the interdependencies between the game-theoretic model and the controller.
The controller observes the population states and inputs a control to drive the system to a global
optimum.

The convergence analysis of the controller involves the understanding of the coupled dynamics 
between the population and the learning algorithm. We show that the convergence is guaranteed 
under time scale separation and mild conditions on the step sizes. In addition, we use virus 
protection over a large-scale network as a motivating application to illustrate the link between 
the game-theoretic model and the application. We fully characterize the global control of the virus 
protection problem in network systems, and corroborate the results with numerical examples. 
We observe a phenomenon of heterogeneity-induced confidence, in which the protection rate
decreases once the heterogeneity of the population exceeds a certain threshold.


A. Related Work 

Large population behaviors have been investigated using models from evolutionary games,
Poisson games, and mean-field games. These models have successfully
captured different aspects of the large population. The evolutionary Poisson game developed
in this work integrates the features of evolutionary games and Poisson games to form a more
powerful framework to analyze and control systems with a large population of agents.

From the perspective of applications, the optimal protection problem and epidemics spreading
over networks have recently been investigated in [15]-[18]. The existing models are not yet sufficient
to incorporate the topological information into a holistic epidemic and game model simultaneously.
In this work, we aim to address this issue by proposing an integrated framework that
can be used to broaden the scope of the applications and capture more pertinent features of the
problem.


B. Organization of the Paper 

The paper is organized as follows. In Section II, we describe the system model, and in
Section III we develop the evolutionary stable equilibrium concept and its associated replicator dynamics.
In Section IV, we present a global control problem, and in Section V we provide explicit results
that completely characterize a class of virus protection problems. Finally, we conclude the paper
in Section VI and discuss future work. Due to the page limit of the paper, we omit the proofs
of the results. The details of the proofs can be found in [19].


II. System Model 

In this section, we introduce the large population game model, and discuss an application of
epidemic protection in a heterogeneous population.


A. Random player game 

Large-scale interacting systems often involve a random number of interacting players. The Poisson
game is a natural framework to capture this phenomenon, and it has been successfully
used to study decentralized resource allocation in networks. A Poisson game $\Gamma$ is
mathematically defined by a five-tuple $(\lambda, \mathcal{T}, r, \mathcal{C}, u)$ where:

• $\lambda$ corresponds to the mean number of players, typically $\lambda \gg 1$,

• $\mathcal{T}$ is the set of types of players, and each player belongs to one type $t \in \mathcal{T}$,

• the probability of a player being of type $t$ is given by $r(t)$, and the number of players of
type $t$ is a Poisson random variable with parameter $\lambda r(t)$,

• $\mathcal{C}$ is the set of all pure actions available to all players,

• the utility of a player of type $t$ is $u_t(a,x)$, where $a$ is the pure action and $x$ is a vector of
size $|\mathcal{C}|$ in which $x(b)$ is the number of players who choose action $b$ in $\mathcal{C}$.

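The expected utility of a player of type $t$ who plays action $a$ while the rest of the players are
expected to play using strategy $\sigma$ is:

$$U_t(a,\sigma) = \sum_{x \in Z(\mathcal{C})} P(x \mid \sigma)\, u_t(a,x),$$

where $Z(\mathcal{C})$ denotes the set of elements $w \in \mathbb{R}^{\mathcal{C}}$ such that $w(c)$ is a non-negative integer for all
$c \in \mathcal{C}$. The decomposition property of the Poisson distribution yields:

$$P(x \mid \sigma) = \prod_{b \in \mathcal{C}} e^{-\lambda \tau(b)}\, \frac{(\lambda \tau(b))^{x(b)}}{x(b)!},$$

and

$$\tau(b) = \sum_{t \in \mathcal{T}} r(t)\, \sigma_t(b).$$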
If players play according to the strategy $\sigma$, then $\sigma_t(c)$ is the probability that a player of type $t$ chooses
the pure action $c$. Finally, the expected utility when a player of type $t$ chooses the mixed action $\theta \in \Delta(\mathcal{C})$ is:

$$U_t(\theta,\sigma) = \sum_{a \in \mathcal{C}} \theta(a)\, U_t(a,\sigma).$$

Definition 1: The strategy $\sigma^*$ is a pure Nash equilibrium if

$$\forall t \in \mathcal{T}, \quad \sigma_t^* \in B_t(\sigma^*),$$

with

$$B_t(\sigma) = \{ b \in \mathcal{C} : b \in \arg\min_{a \in \mathcal{C}} U_t(a,\sigma) \}.$$

We can also extend this framework to the mixed-strategy Nash equilibrium by considering the
set of mixed best responses $\Delta(B_t(\sigma))$.
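
The decomposition property can be illustrated numerically. The following is a minimal Python sketch, assuming the standard product form $P(x \mid \sigma) = \prod_b e^{-\lambda\tau(b)} (\lambda\tau(b))^{x(b)} / x(b)!$ with $\tau(b) = \sum_t r(t)\sigma_t(b)$. The function names, the dictionary encoding of $r$ and $\sigma$, and the numerical values are hypothetical choices made only for this example.

```python
import math

def action_shares(r, sigma):
    """tau(b) = sum_t r[t] * sigma[t][b]: probability that a random player picks b."""
    actions = {b for strategy in sigma.values() for b in strategy}
    return {b: sum(r[t] * sigma[t].get(b, 0.0) for t in r) for b in actions}

def count_probability(x, lam, r, sigma):
    """P(x | sigma): the action counts are independent Poisson variables
    with means lam * tau(b) (decomposition property)."""
    prob = 1.0
    for b, share in action_shares(r, sigma).items():
        mean = lam * share
        k = x.get(b, 0)
        prob *= math.exp(-mean) * mean ** k / math.factorial(k)
    return prob

# Hypothetical instance: two types, two actions, mean population size 3.
r = {"fast": 0.4, "slow": 0.6}
sigma = {"fast": {"OFF": 0.5, "ON": 0.5}, "slow": {"OFF": 0.2, "ON": 0.8}}
lam = 3.0
```

With these values, `action_shares(r, sigma)` gives a share of 0.32 for OFF and 0.68 for ON, and the probability that nobody shows up at all is `exp(-lam)`.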


B. Application to epidemic protection in heterogeneous population 

Having defined the non-cooperative game in the context of heterogeneous, randomly interacting
individuals, we describe one application related to virus protection. Many recent works have
ignored the topology of the interaction, the heterogeneity of the individuals, or the selfishness
of their decisions in their models. In this work, we develop a holistic framework that can
incorporate these features. We start by introducing the Susceptible-Infected-Susceptible (SIS)
epidemic model, which has been well studied in the literature and has recently gained much
interest for modeling the propagation of computer viruses [23], [24].

Consider an SIS epidemic over a graph. There exists a limiting spreading rate, called
the critical epidemic threshold, below which the infection vanishes exponentially fast in time,
and above which the network stays infected. In an in-homogeneous SIS
epidemic, the epidemic threshold can be expressed in terms of the individual effective spreading rate $\tau_i$ of
each node $i$. Indeed, it has been shown in [25] that, by using a mean-field approximation of
the Markov process model for the epidemic, for the complete graph structure with $N$ nodes, the
critical threshold satisfies the following relation:

$$\sum_{i=1}^{N} \frac{1}{1+\tau_i} = N-1.$$

The contamination process of our SIS epidemic is a Poisson process with rate $\beta$, but our spreading
framework is in-homogeneous, as we consider that each node of type $t$ has a recovery process
with rate $\delta_t$.
Our framework is generic enough to be applied to the control of large complex
systems, such as virus spreading and information cascading [26]. In order to illustrate our
framework, we describe in what follows our model for the control of an in-homogeneous
SIS epidemic over a large population in which the interaction structure is stochastic.

We consider an individual protection strategic game where each player has a type, or private
information, which determines his recovery capability (e.g., the rate of recovery), and incomplete
information (e.g., nodes are not aware of the number of players they interact with).¹

The set of pure actions of the players is $\mathcal{C} = \{OFF, ON\}$, and the set $Z(\mathcal{C}) = \mathbb{N}^2$. Players
are characterized by their recovery rate $\delta_t$, which depends on their type $t$. Then, the effective
spreading rate for each type-$t$ player is $\tau_t = \beta/\delta_t$. Based on the expression of the critical epidemic
threshold for the heterogeneous SIS in a complete graph, we obtain the following necessary and
sufficient condition on the effective spreading rates $\tau_t$ and the number of nodes $x_t$ of type $t$
that do not invest, in order for the infection to propagate in a complete graph:

$$\sum_{t=1}^{T} \frac{x_t}{1+\tau_t} \le \sum_{t=1}^{T} x_t - 1. \tag{1}$$

This inequality is equivalent to the linear constraint:

$$\sum_{t=1}^{T} \frac{x_t \tau_t}{1+\tau_t} \ge 1. \tag{2}$$

Depending on the decision of each player, if the infection propagates over the entire network,
then the cost for a player that does not protect itself is $K$; otherwise its cost is 0. This cost
may represent the restoring cost when a node is contaminated, or it can be a penalty
imposed by the system designer in order to control self-protection behavior. Then, the utility
of a player of type $t$ is given by:

$$u_t(OFF,(x_1,\ldots,x_T)) = \begin{cases} K & \text{if } \sum_{t=1}^{T} \frac{x_t \tau_t}{1+\tau_t} \ge 1, \\ 0 & \text{otherwise.} \end{cases}$$
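
The propagation condition and the resulting cost of an unprotected player can be sketched in a few lines of Python. The snippet assumes that conditions (1) and (2) read $\sum_t x_t/(1+\tau_t) \le \sum_t x_t - 1$ and $\sum_t x_t\tau_t/(1+\tau_t) \ge 1$. The function names and the sample rates are hypothetical.

```python
def propagation_load(x, tau):
    """Left-hand side of the linear constraint (2): sum_t x_t * tau_t / (1 + tau_t)."""
    return sum(x_t * tau_t / (1.0 + tau_t) for x_t, tau_t in zip(x, tau))

def infection_propagates(x, tau):
    """Condition (2): the infection spreads over the complete graph."""
    return propagation_load(x, tau) >= 1.0

def condition_one(x, tau):
    """Condition (1), written in its original (non-linear-looking) form."""
    return sum(x_t / (1.0 + tau_t) for x_t, tau_t in zip(x, tau)) <= sum(x) - 1

def cost_off(x, tau, K):
    """Cost of an unprotected player given the counts x of unprotected nodes per type."""
    return K if infection_propagates(x, tau) else 0.0

# Hypothetical effective spreading rates for two types.
tau = [0.5, 2.0]
```

For example, with one unprotected node of the second type only, the load is 2/3 and the infection dies out; with counts `(2, 1)` the load is 4/3 and every unprotected node pays `K`.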


In our framework, the type-$t$ utility function is not defined over the set $Z(\mathcal{C})$ as in the standard Poisson game, but as
follows:

$$u_t : \mathcal{C} \times \underbrace{\mathbb{N} \times \cdots \times \mathbb{N}}_{T \text{ times}} \to \mathbb{R}.$$

¹ Each local interaction has a random number of players which follows a Poisson distribution with an average population equal
to $\lambda$ ($\lambda \gg 1$), i.e., most of the interactions that occur in the population involve a large number of interacting users.
In fact, the utility function in a Poisson game should depend only on the total number of players
choosing the same action, aggregated over the different types. In our model, we do not make such an aggregative
assumption on the utility, and the type has an impact on the utility function. For the same number
of individuals that do not invest, i.e., take the action OFF, the utility of a player depends on
the number of such individuals of each type. We therefore use the random player game
model, which is a generalization of Poisson games. In fact, games with a
random number of players are a natural extension of Bayesian games, which are a class of
incomplete information games [28].

We denote by $X_t$ the random variable that gives the number of players of type $t$ that
do not invest. Based on the decomposition property of the Poisson distribution, $X_t$ is a Poisson
random variable with parameter $\lambda r(t)\sigma_t(OFF)$. Then, the total number of players that do not invest
is a Poisson random variable with parameter $\lambda \sum_{t} r(t)\sigma_t(OFF)$. If a node decides to be protected, it
pays a cost $C$, i.e.,

$$u_t(ON,(x_1,\ldots,x_T)) = C.$$


Note that the utility functions do not depend on the type $t$ of the user.² We consider a
symmetric Nash equilibrium. Denote by $p$ the probability that a player (of any type) chooses
action OFF, i.e., for all types $t \in \mathcal{T}$, $\sigma_t(OFF) = p$ and $\sigma_t(ON) = 1-p$. The expected
utility of a player who plays pure action OFF while all other players are expected to play
according to a mixed strategy $p$ depends on the realization vector $x = (x_1, x_2, \ldots, x_T)$ of the
random vector $X = (X_1,\ldots,X_T)$ by:

$$U_t(OFF,p) = \sum_{x \in \mathbb{N}^T} P(X = x \mid p)\, u_t(OFF,x) =: U(OFF,p).$$

Based on the decomposition and aggregation properties of the Poisson distribution, we arrive at

$$P(X = x \mid p) = \prod_{t=1}^{T} P(X_t = x_t \mid p) = \prod_{t=1}^{T} e^{-\lambda r(t) p}\, \frac{(\lambda r(t) p)^{x_t}}{x_t!}.$$

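² A similar analysis can be done for the case of a type-dependent equilibrium [29].

Then, the expected utility of a player that does not invest in protection, facing a population
profile $p$, is given by:

$$\begin{aligned}
U(OFF,p) &= \sum_{x \in \mathbb{N}^T} P(X = x \mid p)\, u(OFF,x) \\
&= K \sum_{x:\, \sum_{t=1}^{T} \frac{x_t\tau_t}{1+\tau_t} \ge 1} P(X = x \mid p) \\
&= K \Big[ 1 - \sum_{x:\, \sum_{t=1}^{T} \frac{x_t\tau_t}{1+\tau_t} < 1} P(X = x \mid p) \Big] \\
&= K \Big[ 1 - \sum_{x:\, \sum_{t=1}^{T} \frac{x_t\tau_t}{1+\tau_t} < 1} \prod_{t=1}^{T} P(X_t = x_t \mid p) \Big] \\
&= K \Big[ 1 - e^{-\lambda p} \sum_{x:\, \sum_{t=1}^{T} \frac{x_t\tau_t}{1+\tau_t} < 1} \prod_{t=1}^{T} \frac{(\lambda r(t) p)^{x_t}}{x_t!} \Big].
\end{aligned}$$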


Finally, the expected utility from playing action $q \in \Delta(\mathcal{C})$ is given by:

$$U(q,p) = q\,U(OFF,p) + (1-q)\,C.$$
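
Because every weight $\tau_t/(1+\tau_t)$ is positive, the set of count vectors for which the infection does not propagate is finite, so $U(OFF,p)$ can be evaluated by direct enumeration. The following Python sketch does this under the assumption that $U(OFF,p) = K[1 - e^{-\lambda p}\sum_{x}\prod_t (\lambda r(t)p)^{x_t}/x_t!]$, with the sum taken over vectors satisfying $\sum_t x_t\tau_t/(1+\tau_t) < 1$. All function names are hypothetical, and the list-based encoding of $r$ and $\tau$ is an illustrative choice.

```python
import math

def safe_profiles(weights, budget=1.0):
    """Yield every non-negative integer vector x with sum_t x[t] * weights[t] < budget.
    All weights are assumed to be strictly positive, so the enumeration is finite."""
    if not weights:
        yield ()
        return
    k = 0
    while k * weights[0] < budget:
        for tail in safe_profiles(weights[1:], budget - k * weights[0]):
            yield (k,) + tail
        k += 1

def cost_off(p, lam, r, tau, K):
    """Expected cost U(OFF, p) of an unprotected player facing population profile p."""
    weights = [tau_t / (1.0 + tau_t) for tau_t in tau]
    total = 0.0
    for x in safe_profiles(weights):
        term = 1.0
        for r_t, x_t in zip(r, x):
            term *= (lam * r_t * p) ** x_t / math.factorial(x_t)
        total += term
    return K * (1.0 - math.exp(-lam * p) * total)

def mixed_cost(q, p, lam, r, tau, K, C):
    """U(q, p) = q * U(OFF, p) + (1 - q) * C."""
    return q * cost_off(p, lam, r, tau, K) + (1.0 - q) * C
```

With a single type and $\tau = 2$, the only safe counts under this enumeration are 0 and 1, so `cost_off` reduces to $K[1 - e^{-\lambda p}(1 + \lambda p)]$.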

Based on Definition 1, a (symmetric) mixed Nash equilibrium $p^*$ for the protection game with
a random number of players satisfies:

$$\forall q \ne p^*, \quad U(p^*,p^*) \le U(q,p^*).$$


Lemma 1: If $C > K$, then the pure Nash equilibrium is $p^* = 1$.

Proof: The proof of this lemma follows the intuition that if the cost of being protected $C$
is higher than the cost of being infected $K$, then every user will take the risk of being infected.
Indeed, the cost incurred when infected is then less than or equal to the cost of protection. Mathematically
speaking, for any population profile $p$, the utility of a protected individual is $C$, and we
have:

$$U(OFF,p) = K \sum_{x:\, \sum_{t=1}^{T} \frac{x_t\tau_t}{1+\tau_t} \ge 1} P(X = x \mid p) \le K.$$

Then, if $C > K$:

$$\forall p, \quad U(OFF,p) < U(ON,p).$$

Thus the strategy OFF is a dominant strategy and all individuals play this action at equilibrium.




March 30, 2015 


DRAFT 





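We next state the proposition that describes the mixed equilibrium.

Proposition 1 (Existence and Uniqueness): If the parameters of the system $\lambda$, $C$, $K$ (with
$C < K$), $T$, the type distribution $r(\cdot)$ and the effective spreading rates $\tau_t$ satisfy the following
condition:

$$\sum_{x:\, \sum_{t=1}^{T} \frac{x_t\tau_t}{1+\tau_t} < 1} \prod_{t=1}^{T} \frac{(\lambda r(t))^{x_t}}{x_t!} < \left(1 - \frac{C}{K}\right) e^{\lambda},$$

then there exists one unique mixed Nash equilibrium.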
Proof: A mixed Nash equilibrium $p^*$ satisfies:

$$\forall q \ne p^*, \quad U(p^*,p^*) \le U(q,p^*),$$

which is equivalent to:

$$p^* \in \arg\min_{p} U(p,p^*) = \arg\min_{p} \big( p\,U(OFF,p^*) + (1-p)\,C \big).$$

Then, we study the solution $p$ of the equation:

$$K \Big[ 1 - e^{-\lambda p} \sum_{x:\, \sum_{t=1}^{T} \frac{x_t\tau_t}{1+\tau_t} < 1} \prod_{t=1}^{T} \frac{(\lambda r(t) p)^{x_t}}{x_t!} \Big] = C,$$

which is equivalent to:

$$F(p) := \sum_{x:\, \sum_{t=1}^{T} \frac{x_t\tau_t}{1+\tau_t} < 1} \prod_{t=1}^{T} \frac{(\lambda r(t) p)^{x_t}}{x_t!} = \left(1 - \frac{C}{K}\right) e^{\lambda p} =: G(p). \tag{3}$$

Note that both functions are continuous and strictly increasing over the interval $[0,1]$, and we
have:

$$G(0) = \left(1 - \frac{C}{K}\right) < 1 = F(0).$$

The function $G(\cdot)$ is strictly convex over the interval $[0,1]$. The same holds for the function $F(\cdot)$,
which is a finite sum of strictly convex functions and is therefore strictly convex over the
interval $[0,1]$. Then, if we assume that $F(1) < G(1)$, there exists at least one solution of equation
(3) inside the interval $(0,1)$. Moreover, as the left-hand side is a polynomial and the right-hand
side an exponential, if this equation has a solution over $[0,1]$, then this solution is unique. ■
We have proved the existence and uniqueness of the equilibrium depending on the parameters
of the problem. Based on a simple geometric argument, as the functions $F$ and $G$ are continuous over
the interval $[0,1]$, we can show that the equilibrium $p^*$ is strictly increasing in $C$ (i.e., the more
expensive the protection, the more individuals adopt the strategy OFF), and that it is strictly
decreasing in $\lambda$ (i.e., the average number of individuals in each local interaction). Indeed,
having more people on average at each interaction makes individuals more vulnerable, and then
the protection rate (i.e., the proportion of individuals protected) becomes higher at the equilibrium. A
last important remark from the previous proposition is that the strategy ON cannot be a dominant
strategy for any values of the parameters of the model. In fact, $F(0) > G(0)$, and there is always
a proportion of individuals that are not protected.
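
Since the mixed equilibrium solves the indifference condition $U(OFF,p) = C$ and $U(OFF,p)$ is increasing in $p$, it can be located numerically by bisection. The sketch below is one way to do so in Python; it is not the procedure used in the paper. It assumes the enumeration form of $U(OFF,p)$ from Section II-B, and the helper names and the two-type instance are hypothetical.

```python
import itertools
import math

def make_cost_off(lam, r, tau, K):
    """Build p -> U(OFF, p) by enumerating count vectors with sum_t x_t*tau_t/(1+tau_t) < 1."""
    weights = [tau_t / (1.0 + tau_t) for tau_t in tau]
    caps = [math.ceil(1.0 / w) for w in weights]          # x_t can never reach 1 / w_t
    safe = [x for x in itertools.product(*(range(c + 1) for c in caps))
            if sum(k * w for k, w in zip(x, weights)) < 1.0]

    def cost_off(p):
        total = sum(
            math.prod((lam * r_t * p) ** k / math.factorial(k) for r_t, k in zip(r, x))
            for x in safe
        )
        return K * (1.0 - math.exp(-lam * p) * total)

    return cost_off

def mixed_equilibrium(cost_off, C, tol=1e-12):
    """Bisection on U(OFF, p) = C over [0, 1]; returns 1.0 when OFF is never too costly."""
    if cost_off(1.0) <= C:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cost_off(mid) < C:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical two-type instance.
cost = make_cost_off(lam=10.0, r=[0.5, 0.5], tau=[0.8, 2.0], K=1.0)
p_star = mixed_equilibrium(cost, C=0.3)
```

Re-running the solver with a larger price or a larger mean population size gives a quick check of the monotonicity properties discussed above.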


III. Evolutionary stability and dynamics 

Another equilibrium concept, which is more robust than the Nash equilibrium and well
adapted to large population games, is the Evolutionary Stable Strategy (ESS). It is based
on evolutionary principles that were originally defined in biology [30] and have recently been
applied to engineering and systems. In this section, we introduce the concept of ESS
and its associated dynamics.


A. Evolutionary stability concept 

An ESS is a strategy that, if adopted by all the players, is robust against deviations of a
(possibly small) fraction of the population. From a biological point of view, it can be seen as a
generalization of Darwin's idea of survival of the fittest, while from a game theory perspective,
it is a refinement of the Nash equilibrium that satisfies a stability property.

It is important to note that a mixed strategy $p$ can also be interpreted as the distribution
of pure strategies among the players. In our setting, the mixed strategy $p$ can describe,
assuming that each player plays a pure action in $\mathcal{C}$, the proportion of players that choose the
action OFF. This is also called the strategy profile of the population. With this equivalent
point of view of the game in mind, an ESS, if adopted by the whole population, is resistant
against mutations of a small fraction of individuals in the population. Suppose that the whole
population adopts a strategy $q$, and that a fraction $\epsilon$ of mutants deviate to strategy $p$. Strategy $q$
is an ESS if, $\forall p \ne q$, there exists some $\epsilon_p > 0$ such that $\forall \epsilon \in (0,\epsilon_p)$:

$$U(q, \epsilon p + (1-\epsilon) q) < U(p, \epsilon p + (1-\epsilon) q). \tag{4}$$

In other words, this strict inequality says that an ESS defeats any small mutation (relative to $\epsilon$)
of the population profile. In that sense, the ESS equilibrium concept is said to be more robust
than the Nash equilibrium, because it is robust against the deviation of a fraction of players,
and not only one.

Another approach to study the evolutionary stability of an equilibrium is to consider the
dynamics of the strategies inside the population. When the utility function is bilinear, the strict
inequality condition (4) can be replaced by two simple conditions: a Nash condition and a stability
condition. This type of analysis, which treats the game as a standard evolutionary game, has
been proposed to analyze evolutionary stability with a random number of individuals
at each interaction. In our setting, however, the utility function is clearly not bilinear, and we
cannot use the Nash and stability conditions in place of equation (4).

B. Dynamics 

Another way to describe how a population reaches a stable situation is through the replicator
dynamics (RD), which serves to highlight the role of selection from a dynamic perspective. It is
formalized by a system of ordinary differential equations, and it establishes that the evolution
of the size of the populations depends on the fitness they achieve during interactions. A strategy
will persist if its fitness is higher than the fitness averaged over all the strategies used in the
whole population. The folk theorem of evolutionary games establishes a strict connection
between the stable points of the replicator dynamics and the Nash equilibria.

In our context, the RD equation can be formalized as follows:

$$\begin{aligned}
\dot{p}(t) &= p(t)\,[U(p(t),p(t)) - U(OFF,p(t))] \\
&= p(t)(1-p(t))\,[U(ON,p(t)) - U(OFF,p(t))] \\
&= p(t)(1-p(t))\,[C - U(OFF,p(t))].
\end{aligned} \tag{5}$$

The folk theorem of the replicator dynamics in population games implies
that an interior rest point of the dynamics is a Nash equilibrium of our game. This gives
another method to describe the equilibrium, as a rest point of a dynamical system, and provides
an important insight for a global control of the system. We will see in the next section that
this approach makes it possible to define a two-timescale process that optimizes a global
performance of the system without explicitly computing the Nash equilibrium. Moreover, we
prove in the next proposition that the interior rest point of our dynamics is not only a Nash equilibrium
but also an ESS. In our setting, we have the following result.
Proposition 2: If our game has a unique mixed equilibrium $p^*$, then it is an ESS and it
corresponds to the unique interior rest point of the ODE (5).

Proof: Based on Proposition 1, we first assume that the parameters of the system are such
that there exists a mixed Nash equilibrium $p^*$, which we have proved to be unique. Moreover,
this equilibrium is the unique interior rest point of the replicator dynamics:

$$\dot{p}(t) = p(t)(1-p(t))\,[C - U(OFF,p(t))].$$

We have that $U(OFF,p^*) = C$. In order to prove that $p^*$ is an ESS, we choose another mixed
strategy $q$ adopted by a proportion $\epsilon$ of individuals. Then, based on equation (4), $p^*$ is an ESS
if there exists an $\epsilon_q$ such that $\forall \epsilon \in (0,\epsilon_q)$, we have:

$$U(p^*, \epsilon q + (1-\epsilon)p^*) < U(q, \epsilon q + (1-\epsilon)p^*).$$

In fact, we prove this inequality for $\epsilon_q = 1$. Let us first assume w.l.o.g. that $0 < q < p^*$. A similar
analysis can be done for the case when $1 > q > p^*$. We fix $\epsilon \in (0,1)$ and we denote $p_\epsilon =
\epsilon q + (1-\epsilon)p^*$. Note that $q < p_\epsilon < p^*$ (if $\epsilon = 0$ there is no mutant, so it is not an interesting
case). After some simple algebraic manipulations, the inequality

$$U(p^*, \epsilon q + (1-\epsilon)p^*) < U(q, \epsilon q + (1-\epsilon)p^*),$$

rewritten as:

$$U(p^*, p_\epsilon) < U(q, p_\epsilon),$$

is equivalent to

$$(p^* - q)\,(U(OFF,p_\epsilon) - C) < 0.$$

As the mixed equilibrium $p^*$ is unique and is the unique interior rest point of the replicator
dynamics, for all $p < p^*$ (resp. $p > p^*$) we have that $\dot{p}(t) > 0$ (resp. $\dot{p}(t) < 0$). We have that
$p_\epsilon < p^*$ and then $\dot{p}_\epsilon(t) > 0$ which, based on the replicator dynamics ODE (5), means that for
any time $t$:

$$C - U(OFF,p_\epsilon(t)) > 0 \iff C > U(OFF,p_\epsilon(t)).$$

Finally, as this is true for any time $t$, it is also true for any value $p_\epsilon$ such that $q < p_\epsilon < p^*$, and then:

$$(p^* - q)\,(U(OFF,p_\epsilon) - C) < 0,$$

which proves that $p^*$ is an ESS.
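
The ESS inequality can also be checked numerically on a toy instance. The Python sketch below assumes a hypothetical single-type cost $U(OFF,p) = K(1 - e^{-\lambda p})$, chosen because its mixed equilibrium has a closed form; the parameter values and function names are illustrative only. It tests the strict inequality $U(p^*, p_\epsilon) < U(q, p_\epsilon)$ on a grid of mutant strategies $q$ and mutant fractions $\epsilon$.

```python
import math

# Hypothetical parameters (C < K so that a mixed equilibrium exists).
K, C, lam = 1.0, 0.5, 5.0

def cost_off(p):
    """Assumed cost of staying unprotected when a fraction p of players is unprotected."""
    return K * (1.0 - math.exp(-lam * p))

def cost(q, p):
    """U(q, p) = q * U(OFF, p) + (1 - q) * C."""
    return q * cost_off(p) + (1.0 - q) * C

# Interior rest point of the replicator dynamics: U(OFF, p*) = C.
p_star = math.log(K / (K - C)) / lam

def resists_mutant(q, eps):
    """Strict ESS inequality against a fraction eps of mutants playing q."""
    p_eps = eps * q + (1.0 - eps) * p_star
    return cost(p_star, p_eps) < cost(q, p_eps)

all_resisted = all(
    resists_mutant(q / 20.0, eps / 100.0)
    for q in range(21) for eps in range(1, 101)
)
```

The grid deliberately avoids `q == p_star`, for which both sides of the inequality coincide.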
The discrete-time version of the replicator dynamics is described as follows:

$$p_{n+1} = p_n + b(n)\, p_n (1-p_n)\,[C - U(OFF,p_n)]. \tag{6}$$

Let the timescale parameter $b(n)$ satisfy

$$\sum_n b(n) = \infty \quad \text{and} \quad \sum_n b(n)^2 < \infty;$$

then, as $n$ tends to infinity, the discrete-time iteration (6) is an approximation of the replicator
dynamics ODE given by (5).
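
A minimal Python sketch of the discrete-time iteration $p_{n+1} = p_n + b(n)\,p_n(1-p_n)[C - U(OFF,p_n)]$ follows. The step size $b(n) = (n+1)^{-0.6}$ is one hypothetical choice that is not summable but is square-summable, and the cost function $K(1 - e^{-\lambda p})$ is an assumed single-type example used only because its rest point is known in closed form.

```python
import math

def replicator_iterates(cost_off, C, p0, steps, step_size):
    """Iterate p <- p + b(n) * p * (1 - p) * (C - U(OFF, p)) and return the final p."""
    p = p0
    for n in range(steps):
        p += step_size(n) * p * (1.0 - p) * (C - cost_off(p))
    return p

# Hypothetical instance.
K, C, lam = 1.0, 0.5, 5.0
cost_off = lambda p: K * (1.0 - math.exp(-lam * p))
step_size = lambda n: (n + 1) ** -0.6      # sum b(n) diverges, sum b(n)^2 converges

p_final = replicator_iterates(cost_off, C, p0=0.5, steps=5000, step_size=step_size)
p_rest = math.log(K / (K - C)) / lam       # solves U(OFF, p) = C
```

Starting from half of the population unprotected, the iterates decrease toward the interior rest point `p_rest`.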


IV. Online global control 

The overarching goal of this work is to design a global control for the entire heterogeneous,
stochastically interacting population. As an example of a global objective function for the controller
(see Fig. 1), we consider in this section the global revenue. Moreover, we propose an online
learning process, as we assume that the infection cost $K$ is not known by the controller. This
parameter is closely related to how individuals evaluate the cost of being infected, and
it is therefore difficult to estimate correctly. We then propose an online learning algorithm so that a
controller can optimize a global function of the equilibrium population without knowing this
parameter. We use the average revenue as a global function, given by

$$R(C) = \lambda\,(1 - p^*(C))\,C,$$

where $p^*(C)$ is the equilibrium under the control parameter $C$. The controller's goal is to maximize this
function with respect to the price $C$. Based on simulations, we observe that the equilibrium $p^*$
depends on the cost $C$ as an S-shaped function, with a concave part and a strictly convex part. In particular, for
$C$ high enough, the equilibrium is strictly convex in $C$. Then, there exists a price
$C_0$ such that for all $C > C_0$, the function $p^*(C)$ is strictly convex. We then have the following
lemma.

Lemma 2: The revenue $R(C)$ of the provider is strictly concave for $C \in [C_0, K]$.

Proof: We have that $R(C) = \lambda(1 - p^*(C))C$. Differentiating twice with respect to $C$ gives:

$$R'(C) = \lambda\,(1 - p^*(C)) - \lambda C\, \frac{dp^*}{dC}(C),$$

and

$$R''(C) = -2\lambda\, \frac{dp^*}{dC}(C) - \lambda C\, \frac{d^2 p^*}{dC^2}(C) < 0,$$

because the equilibrium is strictly increasing with $C$ and it is strictly convex over the interval
$[C_0, K]$. Then the revenue of the provider is strictly concave over the interval $[C_0, K]$. ■

Based on this result, a gradient algorithm can be used to converge to the optimal price $C^*$ that
maximizes the revenue of the provider. In fact, we have existence and uniqueness of an optimal
price. Designing a gradient algorithm in our context requires estimating the gradient of
the revenue, because there is no closed-form expression of the equilibrium function. We then
propose to use the following Simultaneous Perturbation Stochastic Approximation (SPSA):

$$\frac{dR}{dC}(C) \approx \frac{R(C + \delta\Delta) - R(C)}{\delta\Delta}, \tag{7}$$

where $\Delta$ is a random variable such that $P(\Delta = 1) = P(\Delta = -1) = \frac{1}{2}$, and $\delta$ is a small constant.
A one-sample form of this approximation is proposed in [32]. However, it has been observed that
this one-sample form introduces significant bias, so that in general the two-sample form, given by
equation (7), is considered.
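
The perturbation-based estimate can be written in a few lines. The Python sketch below assumes the form $(R(C+\delta\Delta) - R(C))/(\delta\Delta)$ with $\Delta = \pm 1$ drawn with equal probability. The function name `spsa_gradient` and the quadratic test revenue are hypothetical and serve only to show that the estimate stays within $\delta$ of the true slope for a smooth function.

```python
import random

def spsa_gradient(revenue, C, delta, rng):
    """One SPSA estimate of dR/dC at C: perturb the price by +delta or -delta
    (each with probability 1/2) and take the finite-difference quotient."""
    sign = rng.choice((-1.0, 1.0))
    return (revenue(C + delta * sign) - revenue(C)) / (delta * sign)

# Hypothetical smooth revenue with a maximum at C = 2.
toy_revenue = lambda C: -(C - 2.0) ** 2

rng = random.Random(0)
estimates = [spsa_gradient(toy_revenue, 1.0, 1e-3, rng) for _ in range(10)]
```

For this quadratic, each estimate equals the true derivative 2 plus or minus `delta`, depending on the sign that was drawn.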

Since the equilibrium of the population game is characterized by the rest point of the replicator
dynamics given by equation (5), the controller can set a new price $(C + \delta\Delta)$ to optimize its revenue,
and wait for convergence of the replicator dynamics to the equilibrium. Therefore, by observing
the equilibrium, the controller obtains an estimate of its new revenue and also of the gradient of
its revenue.

Given that for a fixed price $C$ the replicator dynamics has a globally asymptotically stable
equilibrium $p^*(C)$, our algorithm converges to the optimal price. Considering a starting price
$C_0$, we then consider the following approximate gradient descent algorithm:

$$\forall n = 0,1,\ldots, \quad C_{n+1} = C_n + a(n)\, \frac{R(C_n + \delta\Delta_n) - R(C_n)}{\delta\Delta_n} = C_n + a(n)\, f(C_n,\Delta_n),$$

where $a(n)$ is the update step size, $\Delta_n$ is a discrete random variable drawn at each step $n$ such that
$P(\Delta_n = 1) = P(\Delta_n = -1) = \frac{1}{2}$, and

$$R(C_n) = \lambda\,(1 - p^*(C_n))\,C_n.$$

We can then define for all $n$ the following functions:

$$h(C_n) := E[f(C_n,\Delta_1)],$$

and

$$M_{n+1} := f(C_n,\Delta_{n+1}) - h(C_n).$$

Then, the approximate stochastic gradient descent can be written as:

$$C_{n+1} = C_n + a(n)\,[h(C_n) + M_{n+1}],$$

where $M_n$ is a martingale difference sequence. The update step-size parameter $a(n)$ has to
be carefully designed in order to guarantee convergence to the global optimal solution.
Indeed, the control update has to wait for the convergence of the replicator dynamics to the
equilibrium. Therefore, there is a relation between the speeds of the two dynamical processes.
For a fixed $C$, the equilibrium $p^*(C)$ is a global attractor, which is also a globally asymptotically
stable equilibrium of the replicator dynamics. Indeed, the SPSA is coupled with the replicator
dynamics. Let $\epsilon$ be a positive constant; the replicator equation is rewritten as

$$\dot{p}(t) = \frac{1}{\epsilon}\, p(t)(1-p(t))\,[C - U(OFF,p(t))]. \tag{8}$$


By taking $\epsilon$ small enough, the SPSA views the replicator dynamics as quasi-equilibrated, while
the replicator dynamics views the SPSA as quasi-static. Then, we can prove the convergence
of our SPSA by viewing the coupled system as a two-timescale dynamical
system. For a sufficient relative speed of convergence of the replicator dynamics compared to
the stochastic gradient descent, we can prove that our algorithm converges to the price that
optimizes the global objective function. The step size of the strategy update of the population is
$1/n$ in discrete time. Then, we have the following proposition, which gives the conditions on the
step-size update of the gradient algorithm for reaching the optimal solution.

Proposition 3: If the following conditions hold:

$$\sum_n a(n) = \infty, \quad \sum_n a(n)^2 < \infty \quad \text{and} \quad \frac{1}{n\,a(n)} \to 0,$$

then $C_n \to C^*$ a.s.

Proof: On one side, the approximate gradient descent algorithm follows:

$$\forall n = 0,1,2,\ldots, \quad C_{n+1} = C_n + a(n) \left( \frac{R(C_n + \delta\Delta_n) - R(C_n)}{\delta\Delta_n} \right),$$

where the reward depends on the equilibrium as $R(C_n) = \lambda(1 - p^*(C_n))C_n$ and $a(n)$ is the step
size of the updating rule. On the other side, the replicator dynamics given by equation (5) is
the limit of a discrete-time iteration with an update step size $1/n$. Then, the two discrete-time
iterations are coupled as in [33]. In order to have the convergence of the coupled iteration
processes, we need the following conditions:

1) $\sup_n \|C_n\| < \infty$, a.s.,

2) for fixed $C$, the replicator dynamics ODE has a globally asymptotically stable equilibrium
$p^*(C)$,

3) the ODE limit of the approximate gradient algorithm has a globally asymptotically stable
equilibrium when replacing the equilibrium by $p^*(C)$,

4) $\sum_n a(n) = \infty$, $\sum_n a(n)^2 < \infty$ and $\frac{1}{n\,a(n)} \to 0$.

The first point is verified since, if $C > K$, then $p^* = 1$; thus we have that for all $n$, $\|C_n\| \le K$ and
then $\|R(C_n)\| \le \lambda K$. Thus we have:

$$\begin{aligned}
\|C_{n+1}\| &= \left\| C_n + a(n)\, \frac{R(C_n + \delta\Delta_n) - R(C_n)}{\delta\Delta_n} \right\| \\
&\le \|C_n\| + a(n)\, \frac{\|R(C_n + \delta\Delta_n)\| + \|R(C_n)\|}{\delta} \\
&\le K \left( 1 + \frac{2\lambda\, a(n)}{\delta} \right) < \infty,
\end{aligned}$$

and then $\sup_n \|C_n\| < \infty$. The second point is verified as the replicator dynamics rest point,
for a fixed $C$, is an interior ESS in our case and hence a globally asymptotically stable
equilibrium (see Theorem 7 of the corresponding reference). Based on the strict concavity of the revenue function proved in
Lemma 2, this function has a unique maximizer, and then the ODE limit of the stochastic gradient
has a globally asymptotically stable equilibrium. This proves the third point. Finally, we
assume that

$$\sum_n a(n) = \infty, \quad \sum_n a(n)^2 < \infty \quad \text{and} \quad \frac{1}{n\,a(n)} \to 0,$$

which implies that the gradient update moves on a slower timescale than the replicator dynamics.
Thus we have almost sure convergence of our approximate gradient descent algorithm
to the optimal price, i.e., $C_n \to C^*$ a.s. ■

The last proposition provides conditions on the time scale of the gradient algorithm to ensure
that the coupled algorithm converges to the optimal solution of the global control problem.
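
The coupling between the price update and the population dynamics can be sketched as a nested loop: for each candidate price, the population is left to settle under the replicator iteration, and the observed revenue feeds an SPSA step. The Python code below is an illustrative sketch under several assumptions that are not taken from the paper: a single-type cost $K(1 - e^{-\lambda p})$, a fixed unit step for the inner replicator iteration, the step size $a(n) = 0.05/(n+1)^{0.7}$, a price cap below $K$, and all function names.

```python
import math
import random

LAM, K = 5.0, 1.0   # hypothetical parameters; K is used by the population, not by the controller

def cost_off(p):
    """Assumed cost of an unprotected player when a fraction p is unprotected."""
    return K * (1.0 - math.exp(-LAM * p))

def settle_population(C, p=0.5, steps=200):
    """Inner (fast) loop: replicator iteration until the population is nearly at rest."""
    for _ in range(steps):
        p += p * (1.0 - p) * (C - cost_off(p))
    return p

def observed_revenue(C):
    """Revenue lam * (1 - p) * C measured once the population has settled at price C."""
    return LAM * (1.0 - settle_population(C)) * C

def learn_price(C0=0.3, delta=0.01, iterations=300, price_cap=0.98, seed=0):
    """Outer (slow) loop: SPSA ascent on the observed revenue."""
    rng = random.Random(seed)
    C = C0
    for n in range(iterations):
        sign = rng.choice((-1.0, 1.0))
        gradient = (observed_revenue(C + delta * sign) - observed_revenue(C)) / (delta * sign)
        C += 0.05 / (n + 1) ** 0.7 * gradient
        C = min(max(C, 2.0 * delta), price_cap)   # keep the perturbed price positive and bounded
    return C

C_hat = learn_price()
```

The controller only calls `observed_revenue`, so it never needs the value of `K`. The inner loop length and the price cap are tuning choices for this toy instance and would need to be revisited for other parameters.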
V. Complete characterization of the virus protection global control problem

In this section, we first provide a complete characterization of the solutions to the global control
problem for a population of one single type. Second, we show the impact of heterogeneity
and randomness of our system on the equilibrium results and the global control problem.


A. Population with Single Type 

Consider a fully connected network of size $N$ in which all the players are of the same type. Hence $\tau_i = \tau$, and the epidemic threshold $\tau_c$ is exactly equal to the inverse of the largest eigenvalue, i.e., the spectral radius $\lambda_1$, of the adjacency matrix [18]. In particular, we arrive at

$$\tau_c(x) = \frac{1}{x-1},$$

where $x > 1$ is the number of nodes that do not invest. The special cases $x = 1$ and $x = 0$ are not representative of our problem, since there are no propagation effects in these two cases. The utility function for a node that does not invest can be simplified into

$$U_t(\mathrm{OFF}, x) = \begin{cases} K & \text{if } x > 1 + \frac{1}{\tau}, \\ 0 & \text{otherwise.} \end{cases}$$

The value $x$ is not known by every node, but it follows a Poisson distribution with rate $\lambda$. We denote by $p$ the solution to the following equation:

$$\sum_{k=0}^{\lfloor 1/\tau \rfloor} \frac{(\lambda p)^k}{k!}\, e^{-\lambda p} = 1 - \frac{C}{K}.$$


The strategy OFF is dominant if $K < C$, as stated in the corresponding lemma. Next, we investigate the case where $K > C$. Note that if the effective spreading rate is sufficiently large, i.e., $\tau = \frac{\beta}{\delta} > 1$, then we can find the equilibrium given by:

$$p^* = \frac{1}{\lambda} \log\left(\frac{K}{K-C}\right).$$

This solution is not 0 since there is a very small probability that an individual is not infected. However, this value becomes close to 0 as $\lambda$ increases.
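As a quick numerical check of the closed form above, the following sketch evaluates $p^*$ for a few interaction rates and verifies the indifference condition it appears to encode, namely $K(1 - e^{-\lambda p^*}) = C$. Reading the formula as an indifference condition, the function names, and the parameter values are assumptions made for illustration.

```python
import math

def equilibrium_high_spreading(C, K, lam):
    """p* = (1/lam) * log(K / (K - C)), valid for 0 < C < K."""
    if not 0.0 < C < K:
        raise ValueError("closed form requires 0 < C < K")
    # Note: for small lam this value can exceed 1; no clipping is applied here.
    return math.log(K / (K - C)) / lam

def infection_probability(p, lam):
    # Probability that a Poisson(lam * p) number of unprotected contacts is nonzero.
    return 1.0 - math.exp(-lam * p)

if __name__ == "__main__":
    for lam in (2.0, 10.0, 20.0):
        p = equilibrium_high_spreading(C=4.0, K=5.0, lam=lam)
        # Expected loss when unprotected should match the price C at equilibrium.
        print(lam, p, 5.0 * infection_probability(p, lam))
```

With $C$ and $K$ fixed, the computed $p^*$ shrinks like $1/\lambda$, which matches the remark that the value approaches 0 as $\lambda$ increases.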

Proposition 4: If the effective spreading rate is high enough but not too high, i.e., $1/2 < \tau < 1$, then we have the unique equilibrium given by:

$$p^* = -\frac{1}{\lambda}\left[1 + W\left(-e^{-1}\left(1 - \frac{C}{K}\right)\right)\right],$$

where $W(x)$ is known as the Lambert W function and is defined such that $x = W(x)e^{W(x)}$.

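Proof: Assuming that $1/2 < \tau < 1$ implies that $\lfloor 1/\tau \rfloor = 1$, and then the mixed equilibrium is the solution in the interval $[0,1]$ of the following equation:

$$e^{-\lambda p}(1 + \lambda p) = 1 - \frac{C}{K}.$$

This equation is equivalent to:

$$e^{\lambda p} = \frac{1 + \lambda p}{1 - \frac{C}{K}}.$$

This equation is the following transcendental algebraic equation:

$$e^{-cp} = a_0 (p - r),$$

with the following constants:

$$c = -\lambda, \quad a_0 = \frac{\lambda}{1 - \frac{C}{K}}, \quad \text{and} \quad r = -\frac{1}{\lambda}.$$

Then the solution is given by:

$$p^* = r + \frac{1}{c}\,\mathrm{LambertW}\left(\frac{c\,e^{-cr}}{a_0}\right),$$

which gives:

$$p^* = -\frac{1}{\lambda} - \frac{1}{\lambda}\,\mathrm{LambertW}\left(-e^{-1}\left(1 - \frac{C}{K}\right)\right) = -\frac{1}{\lambda}\left[1 + \mathrm{LambertW}\left(-e^{-1}\left(1 - \frac{C}{K}\right)\right)\right].$$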
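The Lambert W expression of Proposition 4 can be evaluated numerically as in the sketch below. The text does not say which branch of $W$ is meant. The example assumes the lower real branch ($W \le -1$), because the principal branch would give $W \in (-1, 0)$ and hence a negative $p^*$. The bisection routine, the function names, and the parameter values are illustrative assumptions; a library routine for the Lambert W function could be used instead.

```python
import math

def lambert_w_minus1(z, iters=200):
    """Solve w * exp(w) = z for w <= -1, with -1/e < z < 0, by bisection."""
    if not -1.0 / math.e < z < 0.0:
        raise ValueError("the lower branch needs -1/e < z < 0")
    lo, hi = -700.0, -1.0   # w * exp(w) decreases from about 0 to -1/e on [lo, hi]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) > z:
            lo = mid        # value still too close to 0: the root has larger w
        else:
            hi = mid
    return 0.5 * (lo + hi)

def equilibrium_lambert(C, K, lam):
    """p* = -(1 + W(-exp(-1) * (1 - C/K))) / lam, using the lower branch of W."""
    z = -math.exp(-1.0) * (1.0 - C / K)
    return -(1.0 + lambert_w_minus1(z)) / lam

if __name__ == "__main__":
    p = equilibrium_lambert(C=4.0, K=5.0, lam=10.0)
    # Residual of exp(-lam p) (1 + lam p) = 1 - C/K; should be close to zero.
    print(p, math.exp(-10.0 * p) * (1.0 + 10.0 * p) - 0.2)
```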


In the case where $\tau < 1/2$, we have the following description of the equilibrium $p^*$:

$$p^* = \begin{cases} p & \text{if } (1 \ldots \\ 1 & \text{otherwise.} \end{cases} \qquad (9)$$

The solution $p$ is in fact the solution of the so-called generalized Lambert W function equation

$$e^{-cx} = \frac{P_N(x)}{Q_M(x)},$$

where $c > 0$ is a constant and $P_N(x)$ and $Q_M(x)$ are polynomials in $x$ of respective orders $N$ and $M$. Though this equation cannot be solved in its general form (approximations are possible for simple cases [35]), it embodies an interesting link between gravity theory and quantum mechanics.
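Since no closed form is available in this regime, the equilibrium can be obtained numerically. The sketch below assumes that the indifference condition takes the form $P(\mathrm{Poisson}(\lambda p) \le m) = 1 - C/K$ with $m = \lfloor 1/\tau \rfloor$. This form is inferred from the two closed-form cases ($m = 0$ and $m = 1$) and is not quoted from the text. When no root exists in $[0, 1]$, the sketch returns $p^* = 1$. Function names and parameter values are hypothetical.

```python
import math

def poisson_cdf(m, mu):
    """P(X <= m) for X ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for k in range(1, m + 1):
        term *= mu / k
        total += term
    return total

def equilibrium_general(C, K, lam, tau, tol=1e-12):
    """Bisection for the share p of unprotected nodes under the assumed condition."""
    m = math.floor(1.0 / tau)
    target = 1.0 - C / K
    # The CDF decreases in p; if it never drops to the target, everyone stays OFF.
    if poisson_cdf(m, lam) >= target:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_cdf(m, lam * mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for tau in (2.0, 0.75, 0.4, 0.2):
        print(tau, equilibrium_general(C=4.0, K=5.0, lam=10.0, tau=tau))
```

For $\tau > 1$ the routine reproduces the logarithmic closed form, which gives a simple consistency check on the assumed equation.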
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B. Population with Multiple Types 

We consider 2 types of individuals in a population, i.e., $T = 2$. We then have $r(1) := r$ (i.e., the proportion of type-1 individuals in the population) and $r(2) := 1 - r$ (i.e., the proportion of type-2 individuals). Individuals of each type differ in their capacity (e.g., recovery rate $\delta$) to recover from the virus. We let $\tau_1 < \tau_2$, which means that $\delta_1 > \delta_2$, i.e., type-1 individuals are more resilient to the virus and generally take less time to react and then to recover.



Fig. 2. Proportion of individuals protected inside the population at equilibrium depending on the average number of nodes $\lambda$ in the graph and two types of individuals, varying the heterogeneity of the population. The parameters are: $\tau_1 = 0.05$, $\tau_2 = 0.2$, $C = 4$ and $K = 5$.

1) Equilibrium Paradox: When $r$ is close to 0 (resp. 1), i.e., only type-2 (resp. type-1) individuals form the population, we observe in Fig. 2 the impact of both heterogeneity parameters, $\lambda$ and the type distribution $r(\cdot)$, on the equilibrium. Specifically, for convenience, we show the percentage of people protecting themselves (i.e., the protection rate), which corresponds to $1 - p^*$. We can observe that the average number of interacting individuals, which is equal to $\lambda$, has a positive impact on the protection rate inside the population. For the same heterogeneity type given by a distribution $r(\cdot)$, the protection rate is strictly increasing with $\lambda$. It is obvious that, when each individual interacts with more individuals on average, individuals become more vulnerable to the contagion and then require a higher level of protection. Comparing $\lambda = 2$ (e.g., a population with mostly pairwise interactions as in the standard evolutionary game framework) and $\lambda = 10$, we can observe that the protection rate is strictly higher only when the proportion of type-2 individuals is higher than around 80%. This means that below this threshold type proportion, individuals do not feel that they are in a risky environment even if the number of interacting individuals is large (i.e., $\lambda = 10$). When $\lambda$ becomes even larger, another interesting property arises. From Fig. 2, we can observe that, for $\lambda = 20$, the impact of the heterogeneity type is significant. By increasing the heterogeneity from $r = 0$ (only type-2 individuals), the protection rate decreases. In fact, increasing the heterogeneity in our scenario means that we reduce the proportion of type-2 individuals while increasing the proportion of type-1 individuals.

Fig. 3. Proportion of individuals protected inside the population at equilibrium depending on the proportion $r$ of the two types of individuals, varying the heterogeneity of the population. The parameters are: $\lambda = 30$, $\tau_2 = 0.2$, $C = 4$ and $K = 5$.

Fig. 4. Convergence of the replicator dynamics to the Nash equilibrium with 2 types of individuals and the following parameters: $\tau_1 = 0.05$, $\tau_2 = 0.2$, $C = 4$, $r = 0.1$ and $K = 5$. We consider two initial points: $p(0) = 0.3$ and $p(0) = 0.7$.

Fig. 5. Revenue of the provider depending on the price $C$ with the following parameters: $\tau_1 = 0.5$, $\tau_2 = 0.98$, $K = 10$, $r(1) = 0.3$ and $\lambda = 10$.

As we further increase the value $r$, the phenomenon of the heterogeneity induced confidence principle arises. A highly heterogeneous population leads to a decreasing protection rate as the heterogeneity reaches around 20% of type-2 individuals. Moreover, this threshold seems to be independent of the average number of individuals in each local interaction.

We can explain this counter-intuitive outcome by considering the particular case with $\lambda = 30$ and with two different effective spreading rates for type-1 individuals. This scenario is depicted in Fig. 3. Here, we can observe that the behavior of the protection rate is as expected, always decreasing when increasing the proportion of type-2 individuals. This is obtained when the effective spreading rate of type-1 individuals is equal to 0.1. In the other case, when $\tau_1 = 0.05$, even if more type-2 individuals are in the population, the global protection rate decreases as more heterogeneity is introduced. Individuals behave more confidently and protect themselves less.

2) Convergence of the Replicator Dynamics: In Fig. 4, the replicator dynamics equation is illustrated for two different starting points and also for different values of the average interaction size $\lambda$. This result confirms the convergence of the ODE to the equilibrium and also that the equilibrium is decreasing with the parameter $\lambda$. In fact, the protection rate is by definition the proportion of individuals that are protected, i.e., $1 - p^*$. Then, we can observe in Fig. 2 that for $r = 0.1$ and $\lambda = 10$, the protection rate of the population is 0.13, which corresponds to the rest point of the replicator dynamics shown by the long dashed line. This result is corroborated for the case where $\lambda = 20$: the protection rate is then 0.56, and the replicator dynamics converge to 0.44.
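The qualitative behavior described here, convergence from different starting points to a rest point that decreases with $\lambda$, can be reproduced with a simple Euler integration of a replicator equation. The two-type payoffs behind Fig. 4 are not restated in this section, so the sketch below falls back on the single-type case with $\tau > 1$, where the payoff difference between OFF and ON is assumed to be $C - K(1 - e^{-\lambda p})$. The payoff model, step size, horizon, and function names are assumptions, and the numbers it produces are not those of Fig. 4.

```python
import math

def replicator_rhs(p, C, K, lam):
    # ASSUMPTION: single-type payoffs, U_OFF = -K * (1 - exp(-lam * p)), U_ON = -C.
    # p is the share of unprotected (OFF) individuals.
    return p * (1.0 - p) * (C - K * (1.0 - math.exp(-lam * p)))

def simulate(p0, C=4.0, K=5.0, lam=10.0, dt=0.01, steps=5000):
    """Explicit Euler integration of dp/dt = replicator_rhs(p)."""
    p = p0
    for _ in range(steps):
        p += dt * replicator_rhs(p, C, K, lam)
    return p

if __name__ == "__main__":
    for lam in (10.0, 20.0):
        for p0 in (0.3, 0.7):
            print(lam, p0, simulate(p0, lam=lam))
```

Both initial conditions reach the same interior rest point, which coincides with $\frac{1}{\lambda}\log\frac{K}{K-C}$ under the assumed payoffs.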

3) Global Optimization and Learning Algorithm: Finally, we illustrate the global control design problem. We first describe in Fig. 5 the global revenue for the controller as a function of his control parameter, and observe the strict concavity property.

We show in Fig. 6 the result of our learning mechanism for several control updates with different step sizes $a(n)$. The first control update satisfies the conditions in Proposition 3. We can observe that for this control update, our learning algorithm converges to the optimal control, whereas the two other control updates do not. The last control update is too fast and finally converges to a control value that is far from the optimal one; therefore, the revenue at this value is far from the optimal revenue.

Fig. 6. Convergence of our learning algorithm with the SPSA to the optimal price with $\delta_1 = 10$, $\delta_2 = 5.1$, $\beta = 5$, $K = 10$, $r(1) = 0.3$ and $\lambda = 10$.


VI. Conclusion

In this paper, we have first described a framework of large population games with heterogeneous types of individuals, in which local interactions involve a random number of individuals of different types. We have developed the concept of evolutionary stability equilibrium as a solution that characterizes the game behavior. A decentralized virus protection problem has been used to motivate and illustrate this framework. In order to achieve a desirable outcome of the game, we have developed methodologies to design a global controller for this dynamic heterogeneous population game. In particular, the interdependency between the global control and the population behaviors has been analyzed using coupled dynamics between approximate stochastic gradient algorithms and the replicator dynamics. We have shown the convergence of our learning algorithm, and provided numerical illustrations to demonstrate the impact of the heterogeneity on the outcome of the system. As future work, we would lift the assumption of a Poisson distribution on the population, and investigate data-driven reinforcement-learning-type algorithms for the global controller.
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